Proper Name Extraction from Non-Journalistic Texts

Authors

Thierry Poibeau

Leila Kosseim

Abstract

This paper discusses the influence of the corpus on the automatic identification of proper names in texts. Techniques developed for the newswire genre are generally not sufficient to deal with larger corpora containing texts that do not follow strict writing constraints (for example, e-mail messages, transcriptions of oral conversations, etc.). After a brief review of the research performed on news texts, we present some of the problems involved in the analysis of two different corpora: e-mails and hand-transcribed telephone conversations. Once the sources of errors have been presented, we then describe an approach to adapt a proper name extraction system developed for newspaper texts to the analysis of e-mail

Similar resources

Textual Similarity based on Proper Names

Proper names represent about 10% of English or French newspaper articles. Their quantity and informational quality is already used in different Information Extraction systems. Proper names have widely been studied in the MUC conferences designed to promote research in Information Extraction. We have created our own named entity extraction tool based on a linguistic description with automata. Th...

Competition of Discourses in Journalistic Translation: Diplomatic Negotiations in Focus

We sought to understand whether, how, and why the translated journalistic texts related to the Iranian nuclear negotiations were manipulated. To this end, we monitored a news agency’s Webpage in a time span of 46 days that began 3 days before Almaty I nuclear talks and ended 3 days after Almaty II talks. Monitoring resulted in a corpus made up of 36 target texts p...

Multilingual corpora with coreferential annotation of person entities

This paper presents three corpora with coreferential annotation of person entities for Portuguese, Galician and Spanish. They contain coreference links between several types of pronouns (including elliptical, possessive, indefinite, demonstrative, relative and personal clitic and non-clitic pronouns) and nominal phrases (including proper nouns). Some statistics have been computed, showing distr...

From Academic to Journalistic Texts: A Qualitative Analysis of the Evaluative Language of Science

This study examined academic articles and journalistic reports in 5 disciplinary areas to explore how similar contents might attitudinally be realized in two different genres. To this end, 25 research articles and 210 news reports were carefully selected and underwent detailed discourse semantic and grammatical analyses with the purpose of identifying the evaluative linguistic patterns....

Resolución de Correferencia de Nombres de Persona para Extracción de Información Biográfica

Information extraction systems need a previous processing step in order to recognize coreferential elements, such as personal name variants. This paper has two aims: the first is to describe the main types of personal name coreference found in encyclopedic and journalistic texts in Spanish. Furthermore, we introduce an algorithm that solves most coreferential links between personal name variant...

Publication date: 2000
